Solving Factored MDPs with Exponential-Family Transition Models

Authors

  • Branislav Kveton
  • Milos Hauskrecht
Abstract

Markov decision processes (MDPs) with discrete and continuous state and action components can be solved efficiently by hybrid approximate linear programming (HALP). The main idea of the approach is to approximate the optimal value function by a linear combination of basis functions and to optimize it by linear programming. In this paper, we extend the existing HALP paradigm beyond the mixture-of-beta transition model. As a result, we permit modeling of other transition functions, such as normal and gamma densities, without approximating them. To allow for efficient solutions to the expectation terms in HALP, we identify a rich class of conjugate basis functions. Finally, we demonstrate the generalized HALP framework on a rover planning problem, which exhibits continuous time and resource uncertainty.

Introduction

Space exploration and the problems arising in this domain have been a very important source of applied AI research in recent years. The design of a planning module for an autonomous Mars rover is one of the challenging problems. Along these lines, Bresina et al. (2002) outlined requirements for such a planning system. These include the ability to plan in continuous time, with concurrent actions, using limited resources, and all of this in the presence of uncertainty. In the same paper, Bresina et al. (2002) described a simplified rover planning problem that exhibits some of these characteristics. In this work, we show how to adapt approximate linear programming (ALP) (Schweitzer & Seidmann 1985) to address these types of problems.

Our paper centers on hybrid ALP (HALP) (Guestrin, Hauskrecht, & Kveton 2004), an established framework for solving large factored MDPs with discrete and continuous state and action variables. The main idea of the approach is to approximate the optimal value function by a linear combination of basis functions and to optimize it by linear programming (LP). The combination of factored reward and transition models with the linear value function approximation is what makes the approach scalable.

The existing HALP framework (Guestrin, Hauskrecht, & Kveton 2004; Hauskrecht & Kveton 2004) imposes a restriction on the problems it can solve: every continuous variable must be bounded on the [0, 1] interval, and all transition functions must be given by a mixture of beta distributions. Different transition models, such as normal distributions, cannot be used directly and have to be approximated. In this work, we relax this restriction and allow exponential-family transition models.

The paper is structured as follows. First, we introduce hybrid factored MDPs (Guestrin, Hauskrecht, & Kveton 2004) and extend them with exponential-family transition functions. Second, we generalize HALP to solve the new class of problems efficiently. Third, we propose a rich class of conjugate basis functions that lead to closed-form solutions to the expectation terms in HALP. Finally, we demonstrate the HALP framework on an autonomous rover planning problem.
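In the purely discrete case, the HALP idea reduces to the classical approximate linear program of Schweitzer & Seidmann (1985): the basis-function weights are the LP variables, the objective weighs states by a state-relevance distribution, and one constraint is generated per state-action pair. HALP extends this to hybrid state and action spaces, where the sums become expectations over the continuous components and the infinite constraint set itself must be approximated. The following Python sketch illustrates only the discrete core; the MDP numbers, the uniform state-relevance weights, and the identity basis are invented for illustration and are not taken from the paper.

import numpy as np
from scipy.optimize import linprog

# Hypothetical 2-state, 2-action MDP, used only to illustrate the ALP core.
gamma = 0.95
P = [np.array([[0.9, 0.1], [0.2, 0.8]]),   # P[a][x, x'] for action a = 0
     np.array([[0.5, 0.5], [0.0, 1.0]])]   # ... and for action a = 1
R = [np.array([1.0, 0.0]),                 # R[a][x] for action a = 0
     np.array([0.0, 2.0])]                 # ... and for action a = 1

# Basis matrix F: one row per state, one column per basis function.
# The identity basis makes this toy LP exact; in (H)ALP the basis is much
# smaller than the state space, which is where the approximation comes from.
F = np.eye(2)
alpha = np.array([0.5, 0.5])               # state-relevance weights

# ALP: minimize alpha^T F w  subject to  F w >= R_a + gamma * P_a F w  for every
# action a, rewritten as (gamma * P_a F - F) w <= -R_a to match linprog's form.
c = F.T @ alpha
A_ub = np.vstack([gamma * Pa @ F - F for Pa in P])
b_ub = np.concatenate([-Ra for Ra in R])
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * F.shape[1])

w = res.x                                  # optimized basis-function weights
print("weights:", w, "approximate value function:", F @ w)

With the identity basis the LP recovers the optimal value function of the toy MDP; with a restricted basis it returns the weights of an upper-bounding approximation whose state-relevance-weighted error is minimized, which is the same quantity the hybrid formulation optimizes.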
Generalized hybrid factored MDPs

Discrete-state factored MDPs (Boutilier, Dearden, & Goldszmidt 1995) permit a compact representation of stochastic decision problems by exploiting their structure. In this section, we introduce a new formalism for representing hybrid factored MDPs with an exponential-family transition model. This formalism is based on the HMDP framework (Guestrin, Hauskrecht, & Kveton 2004) and generalizes its mixture-of-beta transition model for continuous variables.

A hybrid factored MDP with an exponential-family transition model (HMDP) is a 4-tuple M = (X, A, P, R), where X = {X1, ..., Xn} is a state space characterized by a set of state variables, A = {A1, ..., Am} is an action space represented by action variables, P(X' | X, A) is an exponential-family transition model of state dynamics conditioned on the preceding state and action choice, and R is a reward model assigning immediate payoffs to state-action configurations. (A general state and action space MDP is an alternative name for a hybrid MDP; the term hybrid does not refer to the dynamics of the model, which is discrete-time.)

State variables: State variables are either discrete or continuous. The state of the system is observed and described by a vector of value assignments x = (xD, xC), which partitions along its discrete and continuous components xD and xC.

Action variables: The action space is distributed and represented by action variables A. The composite action is given by a vector of individual action choices a = (aD, aC), which partitions along its discrete and continuous components aD and aC.
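For reference, an exponential-family conditional for a continuous next-state variable Xi' can be written in the standard natural-parameter form (the notation below is generic, not the paper's):

P(x_i' \mid x, a) = h(x_i') \exp\left( \theta(x, a)^{\top} T(x_i') - A(\theta(x, a)) \right)

where \theta(x, a) are the natural parameters determined by the preceding state and action, T(x_i') are the sufficient statistics, h is the base measure, and A(\theta) is the log-partition function that normalizes the density. The normal, gamma, and beta densities mentioned in the abstract are all of this form, and choosing basis functions conjugate to these transition densities is what later yields closed-form solutions to the expectation terms in HALP.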

Related articles

On the Smoothness of Linear Value Function Approximations

Markov decision processes (MDPs) with discrete and continuous state and action components can be solved efficiently by hybrid approximate linear programming (HALP). The main idea of the approach is to approximate the optimal value function by a set of basis functions and optimize their weights by linear programming. It is known that the solution to this convex optimization problem minimizes the...

Model Reduction Techniques for Computing Approximately Optimal Solutions for Markov Decision Processes

We present a method for solving implicit (factored) Markov decision processes (MDPs) with very large state spaces. We introduce a property of state space partitions which we call ε-homogeneity. Intuitively, an ε-homogeneous partition groups together states that behave approximately the same under all or some subset of policies. Borrowing from recent work on model minimization in computer-aided soft...

Incremental Structure Learning in Factored MDPs with Continuous States and Actions

Learning factored transition models of structured environments has been shown to provide significant leverage when computing optimal policies for tasks within those environments. Previous work has focused on learning the structure of factored Markov Decision Processes (MDPs) with finite sets of states and actions. In this work we present an algorithm for online incremental learning of transitio...

Structured Possibilistic Planning Using Decision Diagrams

Qualitative Possibilistic Mixed-Observable MDPs (πMOMDPs), generalizing π-MDPs and π-POMDPs, are well-suited models for planning under uncertainty with mixed observability when transition, observation and reward functions are not precisely known and can be qualitatively described. Functions defining the model as well as intermediate calculations are valued in a finite possibilistic scale L, whic...


Publication year: 2006